Overview

Dataset statistics

Number of variables 17
Number of observations 47478
Missing cells 0
Missing cells (%) 0.0%
Duplicate rows 0
Duplicate rows (%) 0.0%
Total size in memory 6.2 MiB
Average record size in memory 136.0 B
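
The memory figures above are consistent with one another, and a few lines of arithmetic confirm it. The sketch below assumes that every cell is counted as 8 bytes (a shallow size). The report does not state this, but it matches the reported 136.0 B per record across 17 variables.

```python
# Sanity check of the reported memory figures (values copied from the report).
N_OBSERVATIONS = 47478
N_VARIABLES = 17
BYTES_PER_VALUE = 8  # assumption: 8 bytes per cell (shallow size), not stated in the report

record_bytes = N_VARIABLES * BYTES_PER_VALUE          # bytes per row
column_kib = N_OBSERVATIONS * BYTES_PER_VALUE / 1024  # one column, index excluded
total_mib = N_OBSERVATIONS * record_bytes / 1024**2   # whole table

print(f"per record: {record_bytes} B")
print(f"per column: about {round(column_kib)} KiB")
print(f"total:      {total_mib:.1f} MiB")
```

Under that assumption the per-column figure lands close to the 371.0 KiB listed for each variable; the small remainder is plausibly index overhead.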

Variable types

Text 1
Categorical 10
Numeric 5
DateTime 1

Alerts

victim_age is highly overall correlated with age_range High correlation
lat is highly overall correlated with city and 2 other fields High correlation
lon is highly overall correlated with city and 2 other fields High correlation
POPULATION is highly overall correlated with city and 2 other fields High correlation
age_range is highly overall correlated with victim_age High correlation
reported_month is highly overall correlated with season High correlation
season is highly overall correlated with reported_month High correlation
city is highly overall correlated with lat and 4 other fields High correlation
state is highly overall correlated with lat and 4 other fields High correlation
LOCATION is highly overall correlated with lat and 4 other fields High correlation
uid has unique values Unique

Reproduction

Analysis started 2023-09-17 20:57:35.397245
Analysis finished 2023-09-17 20:57:41.373933
Duration 5.98 seconds
Software version ydata-profiling v4.3.2

Variables

uid
Text

UNIQUE 

Distinct 47478
Distinct (%) 100.0%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB

Length

Max length 10
Median length 10
Mean length 9.9109482
Min length 9

Characters and Unicode

Total characters 470552
Distinct characters 47
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
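
As an illustration of these character properties, the general category of each code point can be read with Python's standard `unicodedata` module. This is only a sketch of the idea, not the profiler's own implementation; the helper name `category_counts` is hypothetical. The two-letter codes correspond to the category names used in this report (`Nd` Decimal Number, `Ll` Lowercase Letter, `Lu` Uppercase Letter, `Pd` Dash Punctuation).

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


# The first three uid values from the Sample listing.
sample = ["Alb-000001", "Alb-000002", "Alb-000003"]
counts = category_counts(sample)
print(counts)
```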

Unique

Unique 47478
Unique (%) 100.0%

Sample

1st row Alb-000001
2nd row Alb-000002
3rd row Alb-000003
4th row Alb-000004
5th row Alb-000005
Value Count Frequency (%)
alb-000001 1 < 0.1%
alb-000010 1 < 0.1%
alb-000025 1 < 0.1%
alb-000024 1 < 0.1%
alb-000003 1 < 0.1%
alb-000004 1 < 0.1%
alb-000005 1 < 0.1%
alb-000006 1 < 0.1%
alb-000007 1 < 0.1%
alb-000008 1 < 0.1%
Other values (47468) 47468 > 99.9%

Most occurring characters

Value Count Frequency (%)
0 137803 29.3%
- 47478 10.1%
1 23133 4.9%
2 18590 4.0%
3 18035 3.8%
4 16955 3.6%
7 16112 3.4%
5 14730 3.1%
6 14334 3.0%
i 12920 2.7%
Other values (37) 150462 32.0%

Most occurring categories

Value Count Frequency (%)
Decimal Number 284868
60.5%
Lowercase Letter 84839
 
18.0%
Uppercase Letter 53367
 
11.3%
Dash Punctuation 47478
 
10.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
i 12920
15.2%
a 11031
13.0%
h 9220
10.9%
o 8717
10.3%
l 7490
8.8%
t 6615
7.8%
s 6163
7.3%
u 4839
 
5.7%
e 4798
 
5.7%
n 3066
 
3.6%
Other values (8) 9980
11.8%
Uppercase Letter
Value Count Frequency (%)
C 7945
14.9%
L 6106
11.4%
B 5424
10.2%
S 4912
9.2%
P 3664
 
6.9%
D 3534
 
6.6%
M 3451
 
6.5%
O 3394
 
6.4%
H 2908
 
5.4%
N 2771
 
5.2%
Other values (8) 9258
17.3%
Decimal Number
Value Count Frequency (%)
0 137803
48.4%
1 23133
 
8.1%
2 18590
 
6.5%
3 18035
 
6.3%
4 16955
 
6.0%
7 16112
 
5.7%
5 14730
 
5.2%
6 14334
 
5.0%
8 12854
 
4.5%
9 12322
 
4.3%
Dash Punctuation
Value Count Frequency (%)
- 47478
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 332346
70.6%
Latin 138206
29.4%

Most frequent character per script

Latin
Value Count Frequency (%)
i 12920
 
9.3%
a 11031
 
8.0%
h 9220
 
6.7%
o 8717
 
6.3%
C 7945
 
5.7%
l 7490
 
5.4%
t 6615
 
4.8%
s 6163
 
4.5%
L 6106
 
4.4%
B 5424
 
3.9%
Other values (26) 56575
40.9%
Common
Value Count Frequency (%)
0 137803
41.5%
- 47478
 
14.3%
1 23133
 
7.0%
2 18590
 
5.6%
3 18035
 
5.4%
4 16955
 
5.1%
7 16112
 
4.8%
5 14730
 
4.4%
6 14334
 
4.3%
8 12854
 
3.9%

Most occurring blocks

Value Count Frequency (%)
ASCII 470552
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 137803
29.3%
- 47478
 
10.1%
1 23133
 
4.9%
2 18590
 
4.0%
3 18035
 
3.8%
4 16955
 
3.6%
7 16112
 
3.4%
5 14730
 
3.1%
6 14334
 
3.0%
i 12920
 
2.7%
Other values (37) 150462
32.0%

disposition
Categorical

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB
No Arrest
24258 
Arrest Made
23220 

Length

Max length 11
Median length 9
Mean length 9.9781372
Min length 9
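
The length statistics follow directly from the two category counts listed at the top of this variable. A minimal check, using only numbers from this report:

```python
# Category counts for `disposition`, copied from the report.
counts = {"No Arrest": 24258, "Arrest Made": 23220}

n_rows = sum(counts.values())
total_chars = sum(len(value) * count for value, count in counts.items())
mean_length = total_chars / n_rows

print(n_rows, total_chars, round(mean_length, 7))
```

The total agrees with the "Total characters" figure reported for this variable, and the median length is 9 because "No Arrest" covers more than half of the rows.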

Characters and Unicode

Total characters 473742
Distinct characters 11
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row No Arrest
2nd row Arrest Made
3rd row No Arrest
4th row Arrest Made
5th row No Arrest

Common Values

Value Count Frequency (%)
No Arrest 24258 51.1%
Arrest Made 23220 48.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
arrest 47478 50.0%
no 24258 25.5%
made 23220 24.5%

Most occurring characters

Value Count Frequency (%)
r 94956
20.0%
e 70698
14.9%
47478
10.0%
A 47478
10.0%
s 47478
10.0%
t 47478
10.0%
N 24258
 
5.1%
o 24258
 
5.1%
M 23220
 
4.9%
a 23220
 
4.9%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 331308
69.9%
Uppercase Letter 94956
 
20.0%
Space Separator 47478
 
10.0%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
r 94956
28.7%
e 70698
21.3%
s 47478
14.3%
t 47478
14.3%
o 24258
 
7.3%
a 23220
 
7.0%
d 23220
 
7.0%
Uppercase Letter
Value Count Frequency (%)
A 47478
50.0%
N 24258
25.5%
M 23220
24.5%
Space Separator
Value Count Frequency (%)
47478
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 426264
90.0%
Common 47478
 
10.0%

Most frequent character per script

Latin
Value Count Frequency (%)
r 94956
22.3%
e 70698
16.6%
A 47478
11.1%
s 47478
11.1%
t 47478
11.1%
N 24258
 
5.7%
o 24258
 
5.7%
M 23220
 
5.4%
a 23220
 
5.4%
d 23220
 
5.4%
Common
Value Count Frequency (%)
47478
100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 473742
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
r 94956
20.0%
e 70698
14.9%
47478
10.0%
A 47478
10.0%
s 47478
10.0%
t 47478
10.0%
N 24258
 
5.1%
o 24258
 
5.1%
M 23220
 
4.9%
a 23220
 
4.9%

victim_sex
Categorical

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB
Male
40387 
Female
7091 

Length

Max length 6
Median length 4
Mean length 4.2987068
Min length 4

Characters and Unicode

Total characters 204094
Distinct characters 6
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Male
2nd row Male
3rd row Female
4th row Male
5th row Female

Common Values

Value Count Frequency (%)
Male 40387 85.1%
Female 7091 14.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
male 40387 85.1%
female 7091 14.9%

Most occurring characters

Value Count Frequency (%)
e 54569
26.7%
a 47478
23.3%
l 47478
23.3%
M 40387
19.8%
F 7091
 
3.5%
m 7091
 
3.5%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 156616
76.7%
Uppercase Letter 47478
 
23.3%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 54569
34.8%
a 47478
30.3%
l 47478
30.3%
m 7091
 
4.5%
Uppercase Letter
Value Count Frequency (%)
M 40387
85.1%
F 7091
 
14.9%

Most occurring scripts

Value Count Frequency (%)
Latin 204094
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
e 54569
26.7%
a 47478
23.3%
l 47478
23.3%
M 40387
19.8%
F 7091
 
3.5%
m 7091
 
3.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 204094
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
e 54569
26.7%
a 47478
23.3%
l 47478
23.3%
M 40387
19.8%
F 7091
 
3.5%
m 7091
 
3.5%

victim_race
Categorical

Distinct 5
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB
Black
33062 
Hispanic
6817 
White
6259 
Asian
 
676
Other
 
664

Length

Max length 8
Median length 5
Mean length 5.4307469
Min length 5

Characters and Unicode

Total characters 257841
Distinct characters 17
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Hispanic
2nd row Hispanic
3rd row White
4th row Hispanic
5th row White

Common Values

Value Count Frequency (%)
Black 33062 69.6%
Hispanic 6817 14.4%
White 6259 13.2%
Asian 676 1.4%
Other 664 1.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
black 33062 69.6%
hispanic 6817 14.4%
white 6259 13.2%
asian 676 1.4%
other 664 1.4%

Most occurring characters

Value Count Frequency (%)
a 40555
15.7%
c 39879
15.5%
B 33062
12.8%
k 33062
12.8%
l 33062
12.8%
i 20569
8.0%
s 7493
 
2.9%
n 7493
 
2.9%
h 6923
 
2.7%
e 6923
 
2.7%
Other values (7) 28820
11.2%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 210363
81.6%
Uppercase Letter 47478
 
18.4%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
a 40555
19.3%
c 39879
19.0%
k 33062
15.7%
l 33062
15.7%
i 20569
9.8%
s 7493
 
3.6%
n 7493
 
3.6%
h 6923
 
3.3%
e 6923
 
3.3%
t 6923
 
3.3%
Other values (2) 7481
 
3.6%
Uppercase Letter
Value Count Frequency (%)
B 33062
69.6%
H 6817
 
14.4%
W 6259
 
13.2%
A 676
 
1.4%
O 664
 
1.4%

Most occurring scripts

Value Count Frequency (%)
Latin 257841
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
a 40555
15.7%
c 39879
15.5%
B 33062
12.8%
k 33062
12.8%
l 33062
12.8%
i 20569
8.0%
s 7493
 
2.9%
n 7493
 
2.9%
h 6923
 
2.7%
e 6923
 
2.7%
Other values (7) 28820
11.2%

Most occurring blocks

Value Count Frequency (%)
ASCII 257841
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
a 40555
15.7%
c 39879
15.5%
B 33062
12.8%
k 33062
12.8%
l 33062
12.8%
i 20569
8.0%
s 7493
 
2.9%
n 7493
 
2.9%
h 6923
 
2.7%
e 6923
 
2.7%
Other values (7) 28820
11.2%

victim_age
Real number (ℝ)

HIGH CORRELATION 

Distinct 101
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 31.792325
Minimum 0
Maximum 102
Zeros 371
Zeros (%) 0.8%
Negative 0
Negative (%) 0.0%
Memory size 371.0 KiB

Quantile statistics

Minimum 0
5-th percentile 16
Q1 22
median 28
Q3 40
95-th percentile 59
Maximum 102
Range 102
Interquartile range (IQR) 18

Descriptive statistics

Standard deviation 14.404802
Coefficient of variation (CV) 0.45309056
Kurtosis 1.164053
Mean 31.792325
Median Absolute Deviation (MAD) 8
Skewness 0.90045942
Sum 1509436
Variance 207.49833
Monotonicity Not monotonic
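
Several of the statistics above are simple functions of the others. The following sketch recomputes them from the rounded values printed in this report, so agreement is only expected up to rounding:

```python
import math

# Values copied from the victim_age statistics in this report.
n_rows = 47478
mean = 31.792325
std = 14.404802
q1, q3 = 22, 40
minimum, maximum = 0, 102

cv = std / mean                    # coefficient of variation
variance = std ** 2                # variance is the squared standard deviation
iqr = q3 - q1                      # interquartile range
value_range = maximum - minimum    # range
total = mean * n_rows              # sum of all values

print(cv, variance, iqr, value_range, round(total))
```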
Histogram with fixed size bins (bins=50)

Common Values

Value Count Frequency (%)
22 2054 4.3%
21 2023 4.3%
23 1981 4.2%
24 1930 4.1%
19 1902 4.0%
20 1849 3.9%
25 1835 3.9%
26 1736 3.7%
18 1592 3.4%
27 1523 3.2%
Other values (91) 29053 61.2%

Minimum 10 values

Value Count Frequency (%)
0 371 0.8%
1 326 0.7%
2 182 0.4%
3 123 0.3%
4 78 0.2%
5 49 0.1%
6 45 0.1%
7 42 0.1%
8 29 0.1%
9 28 0.1%

Maximum 10 values

Value Count Frequency (%)
102 1 < 0.1%
101 1 < 0.1%
99 1 < 0.1%
97 3 < 0.1%
96 2 < 0.1%
95 4 < 0.1%
94 5 < 0.1%
93 6 < 0.1%
92 6 < 0.1%
91 11 < 0.1%

age_range
Categorical

HIGH CORRELATION 

Distinct 5
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB
18-29
21428 
30-44
13461 
45-64
7144 
0-17
3916 
65+
 
1529

Length

Max length 5
Median length 5
Mean length 4.8531109
Min length 3

Characters and Unicode

Total characters 230416
Distinct characters 12
Distinct categories 3
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 65+
2nd row 0-17
3rd row 0-17
4th row 30-44
5th row 65+

Common Values

Value Count Frequency (%)
18-29 21428 45.1%
30-44 13461 28.4%
45-64 7144 15.0%
0-17 3916 8.2%
65+ 1529 3.2%
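
The HIGH CORRELATION alert between age_range and victim_age is what one would expect if age_range is a binned copy of victim_age. The report does not say how the column was built, so the sketch below is an assumption: the bin edges are read off the five labels, and the function name `age_range` is hypothetical.

```python
from bisect import bisect_right

# Assumed lower bounds, inferred from the labels 0-17, 18-29, 30-44, 45-64, 65+.
_LOWER_BOUNDS = [0, 18, 30, 45, 65]
_LABELS = ["0-17", "18-29", "30-44", "45-64", "65+"]


def age_range(age):
    """Map an age in years to one of the five age_range labels."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right returns how many lower bounds are <= age.
    return _LABELS[bisect_right(_LOWER_BOUNDS, age) - 1]


print([age_range(a) for a in (0, 17, 18, 44, 64, 65, 102)])
```

A deterministic mapping like this would make the two columns largely redundant for modelling, which is the practical meaning of the alert.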

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
18-29 21428 45.1%
30-44 13461 28.4%
45-64 7144 15.0%
0-17 3916 8.2%
65 1529 3.2%

Most occurring characters

Value Count Frequency (%)
- 45949
19.9%
4 41210
17.9%
1 25344
11.0%
8 21428
9.3%
2 21428
9.3%
9 21428
9.3%
0 17377
 
7.5%
3 13461
 
5.8%
5 8673
 
3.8%
6 8673
 
3.8%
Other values (2) 5445
 
2.4%

Most occurring categories

Value Count Frequency (%)
Decimal Number 182938
79.4%
Dash Punctuation 45949
 
19.9%
Math Symbol 1529
 
0.7%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
4 41210
22.5%
1 25344
13.9%
8 21428
11.7%
2 21428
11.7%
9 21428
11.7%
0 17377
9.5%
3 13461
 
7.4%
5 8673
 
4.7%
6 8673
 
4.7%
7 3916
 
2.1%
Dash Punctuation
Value Count Frequency (%)
- 45949
100.0%
Math Symbol
Value Count Frequency (%)
+ 1529
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 230416
100.0%

Most frequent character per script

Common
Value Count Frequency (%)
- 45949
19.9%
4 41210
17.9%
1 25344
11.0%
8 21428
9.3%
2 21428
9.3%
9 21428
9.3%
0 17377
 
7.5%
3 13461
 
5.8%
5 8673
 
3.8%
6 8673
 
3.8%
Other values (2) 5445
 
2.4%

Most occurring blocks

Value Count Frequency (%)
ASCII 230416
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
- 45949
19.9%
4 41210
17.9%
1 25344
11.0%
8 21428
9.3%
2 21428
9.3%
9 21428
9.3%
0 17377
 
7.5%
3 13461
 
5.8%
5 8673
 
3.8%
6 8673
 
3.8%
Other values (2) 5445
 
2.4%
Distinct 4018
Distinct (%) 8.5%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB
Minimum 2007-01-01 00:00:00
Maximum 2017-12-31 00:00:00
Histogram with fixed size bins (bins=50)

reported_year
Real number (ℝ)

Distinct 11
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 2012.3466
Minimum 2007
Maximum 2017
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 371.0 KiB

Quantile statistics

Minimum 2007
5-th percentile 2007
Q1 2010
median 2012
Q3 2015
95-th percentile 2017
Maximum 2017
Range 10
Interquartile range (IQR) 5

Descriptive statistics

Standard deviation 3.153443
Coefficient of variation (CV) 0.0015670476
Kurtosis -1.2016741
Mean 2012.3466
Median Absolute Deviation (MAD) 3
Skewness -0.15546959
Sum 95542194
Variance 9.9442029
Monotonicity Not monotonic
Histogram with fixed size bins (bins=11)

Common Values

Value Count Frequency (%)
2016 5794 12.2%
2015 4927 10.4%
2017 4517 9.5%
2012 4483 9.4%
2014 4248 8.9%
2010 4238 8.9%
2013 4199 8.8%
2011 4140 8.7%
2007 3807 8.0%
2008 3753 7.9%

Minimum 10 values

Value Count Frequency (%)
2007 3807 8.0%
2008 3753 7.9%
2009 3372 7.1%
2010 4238 8.9%
2011 4140 8.7%
2012 4483 9.4%
2013 4199 8.8%
2014 4248 8.9%
2015 4927 10.4%
2016 5794 12.2%

Maximum 10 values

Value Count Frequency (%)
2017 4517 9.5%
2016 5794 12.2%
2015 4927 10.4%
2014 4248 8.9%
2013 4199 8.8%
2012 4483 9.4%
2011 4140 8.7%
2010 4238 8.9%
2009 3372 7.1%
2008 3753 7.9%

reported_month
Categorical

HIGH CORRELATION 

Distinct 12
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB
July
4624 
August
4360 
June
4273 
May
4210 
September
4131 
Other values (7)
25880 

Length

Max length 9
Median length 7
Mean length 6.0797001
Min length 3

Characters and Unicode

Total characters 288652
Distinct characters 26
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row May
2nd row February
3rd row June
4th row January
5th row January

Common Values

Value Count Frequency (%)
July 4624 9.7%
August 4360 9.2%
June 4273 9.0%
May 4210 8.9%
September 4131 8.7%
October 4060 8.6%
December 3890 8.2%
November 3886 8.2%
April 3779 8.0%
January 3682 7.8%
Other values (2) 6583 13.9%

Length

Histogram of lengths of the category

Value Count Frequency (%)
july 4624 9.7%
august 4360 9.2%
june 4273 9.0%
may 4210 8.9%
september 4131 8.7%
october 4060 8.6%
december 3890 8.2%
november 3886 8.2%
april 3779 8.0%
january 3682 7.8%
Other values (2) 6583 13.9%

Most occurring characters

Value Count Frequency (%)
e 43129
14.9%
r 32972
 
11.4%
u 24260
 
8.4%
b 18928
 
6.6%
a 18157
 
6.3%
y 15477
 
5.4%
J 12579
 
4.4%
t 12551
 
4.3%
m 11907
 
4.1%
c 11572
 
4.0%
Other values (16) 87120
30.2%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 241174
83.6%
Uppercase Letter 47478
 
16.4%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 43129
17.9%
r 32972
13.7%
u 24260
10.1%
b 18928
7.8%
a 18157
7.5%
y 15477
 
6.4%
t 12551
 
5.2%
m 11907
 
4.9%
c 11572
 
4.8%
l 8403
 
3.5%
Other values (8) 43818
18.2%
Uppercase Letter
Value Count Frequency (%)
J 12579
26.5%
A 8139
17.1%
M 7832
16.5%
S 4131
 
8.7%
O 4060
 
8.6%
D 3890
 
8.2%
N 3886
 
8.2%
F 2961
 
6.2%

Most occurring scripts

Value Count Frequency (%)
Latin 288652
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
e 43129
14.9%
r 32972
 
11.4%
u 24260
 
8.4%
b 18928
 
6.6%
a 18157
 
6.3%
y 15477
 
5.4%
J 12579
 
4.4%
t 12551
 
4.3%
m 11907
 
4.1%
c 11572
 
4.0%
Other values (16) 87120
30.2%

Most occurring blocks

Value Count Frequency (%)
ASCII 288652
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
e 43129
14.9%
r 32972
 
11.4%
u 24260
 
8.4%
b 18928
 
6.6%
a 18157
 
6.3%
y 15477
 
5.4%
J 12579
 
4.4%
t 12551
 
4.3%
m 11907
 
4.1%
c 11572
 
4.0%
Other values (16) 87120
30.2%

reported_weekday
Categorical

Distinct 7
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB
Sunday
7850 
Saturday
7619 
Monday
6853 
Friday
6446 
Tuesday
6331 
Other values (2)
12379 

Length

Max length 9
Median length 8
Mean length 7.1075235
Min length 6

Characters and Unicode

Total characters 337451
Distinct characters 17
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Tuesday
2nd row Tuesday
3rd row Tuesday
4th row Friday
5th row Saturday

Common Values

Value Count Frequency (%)
Sunday 7850 16.5%
Saturday 7619 16.0%
Monday 6853 14.4%
Friday 6446 13.6%
Tuesday 6331 13.3%
Wednesday 6256 13.2%
Thursday 6123 12.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
sunday 7850 16.5%
saturday 7619 16.0%
monday 6853 14.4%
friday 6446 13.6%
tuesday 6331 13.3%
wednesday 6256 13.2%
thursday 6123 12.9%

Most occurring characters

Value Count Frequency (%)
a 55097
16.3%
d 53734
15.9%
y 47478
14.1%
u 27923
8.3%
n 20959
 
6.2%
r 20188
 
6.0%
e 18843
 
5.6%
s 18710
 
5.5%
S 15469
 
4.6%
T 12454
 
3.7%
Other values (7) 46596
13.8%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 289973
85.9%
Uppercase Letter 47478
 
14.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
a 55097
19.0%
d 53734
18.5%
y 47478
16.4%
u 27923
9.6%
n 20959
 
7.2%
r 20188
 
7.0%
e 18843
 
6.5%
s 18710
 
6.5%
t 7619
 
2.6%
o 6853
 
2.4%
Other values (2) 12569
 
4.3%
Uppercase Letter
Value Count Frequency (%)
S 15469
32.6%
T 12454
26.2%
M 6853
14.4%
F 6446
13.6%
W 6256
13.2%

Most occurring scripts

Value Count Frequency (%)
Latin 337451
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
a 55097
16.3%
d 53734
15.9%
y 47478
14.1%
u 27923
8.3%
n 20959
 
6.2%
r 20188
 
6.0%
e 18843
 
5.6%
s 18710
 
5.5%
S 15469
 
4.6%
T 12454
 
3.7%
Other values (7) 46596
13.8%

Most occurring blocks

Value Count Frequency (%)
ASCII 337451
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
a 55097
16.3%
d 53734
15.9%
y 47478
14.1%
u 27923
8.3%
n 20959
 
6.2%
r 20188
 
6.0%
e 18843
 
5.6%
s 18710
 
5.5%
S 15469
 
4.6%
T 12454
 
3.7%
Other values (7) 46596
13.8%

season
Categorical

HIGH CORRELATION 

Distinct 4
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB
Summer
13257 
Fall
12077 
Spring
11611 
Winter
10533 

Length

Max length 6
Median length 6
Mean length 5.4912591
Min length 4

Characters and Unicode

Total characters 260714
Distinct characters 14
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Spring
2nd row Winter
3rd row Summer
4th row Winter
5th row Winter

Common Values

Value Count Frequency (%)
Summer 13257 27.9%
Fall 12077 25.4%
Spring 11611 24.5%
Winter 10533 22.2%
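
The season counts can be reproduced from the reported_month counts by grouping months into three-month meteorological seasons, which would explain the HIGH CORRELATION alert between the two variables. The mapping below is an inference from the numbers, not something the report states. The February and March counts are not listed individually either; they are derived here from the uppercase-letter counts for reported_month (F 2961, and M 7832 minus May 4210).

```python
from collections import Counter

# Assumed month-to-season mapping (meteorological seasons).
SEASON_BY_MONTH = {
    "December": "Winter", "January": "Winter", "February": "Winter",
    "March": "Spring", "April": "Spring", "May": "Spring",
    "June": "Summer", "July": "Summer", "August": "Summer",
    "September": "Fall", "October": "Fall", "November": "Fall",
}

# Month counts from the report; February and March are derived (see lead-in).
month_counts = {
    "January": 3682, "February": 2961, "March": 7832 - 4210, "April": 3779,
    "May": 4210, "June": 4273, "July": 4624, "August": 4360,
    "September": 4131, "October": 4060, "November": 3886, "December": 3890,
}

season_counts = Counter()
for month, count in month_counts.items():
    season_counts[SEASON_BY_MONTH[month]] += count

print(dict(season_counts))
```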

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
summer 13257 27.9%
fall 12077 25.4%
spring 11611 24.5%
winter 10533 22.2%

Most occurring characters

Value Count Frequency (%)
r 35401
13.6%
m 26514
10.2%
S 24868
9.5%
l 24154
9.3%
e 23790
9.1%
i 22144
8.5%
n 22144
8.5%
u 13257
 
5.1%
F 12077
 
4.6%
a 12077
 
4.6%
Other values (4) 44288
17.0%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 213236
81.8%
Uppercase Letter 47478
 
18.2%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
r 35401
16.6%
m 26514
12.4%
l 24154
11.3%
e 23790
11.2%
i 22144
10.4%
n 22144
10.4%
u 13257
 
6.2%
a 12077
 
5.7%
p 11611
 
5.4%
g 11611
 
5.4%
Uppercase Letter
Value Count Frequency (%)
S 24868
52.4%
F 12077
25.4%
W 10533
22.2%

Most occurring scripts

Value Count Frequency (%)
Latin 260714
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
r 35401
13.6%
m 26514
10.2%
S 24868
9.5%
l 24154
9.3%
e 23790
9.1%
i 22144
8.5%
n 22144
8.5%
u 13257
 
5.1%
F 12077
 
4.6%
a 12077
 
4.6%
Other values (4) 44288
17.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 260714
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
r 35401
13.6%
m 26514
10.2%
S 24868
9.5%
l 24154
9.3%
e 23790
9.1%
i 22144
8.5%
n 22144
8.5%
u 13257
 
5.1%
F 12077
 
4.6%
a 12077
 
4.6%
Other values (4) 44288
17.0%

city
Categorical

HIGH CORRELATION 

Distinct 47
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB
Chicago
5523 
Philadelphia
 
3036
Houston
 
2908
Baltimore
 
2827
Detroit
 
2496
Other values (42)
30688 

Length

Max length 14
Median length 12
Mean length 8.9042715
Min length 5

Characters and Unicode

Total characters 422757
Distinct characters 44
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Albuquerque
2nd row Albuquerque
3rd row Albuquerque
4th row Albuquerque
5th row Albuquerque

Common Values

Value Count Frequency (%)
Chicago 5523 11.6%
Philadelphia 3036 6.4%
Houston 2908 6.1%
Baltimore 2827 6.0%
Detroit 2496 5.3%
Los Angeles 2196 4.6%
St. Louis 1661 3.5%
Memphis 1510 3.2%
New Orleans 1394 2.9%
Indianapolis 1321 2.8%
Other values (37) 22606 47.6%

Length

Histogram of lengths of the category

Value Count Frequency (%)
chicago 5523 9.4%
philadelphia 3036 5.2%
houston 2908 4.9%
baltimore 2827 4.8%
detroit 2496 4.2%
san 2212 3.8%
los 2196 3.7%
angeles 2196 3.7%
new 2016 3.4%
st 1661 2.8%
Other values (46) 31794 54.0%

Most occurring characters

Value Count Frequency (%)
a 41419
 
9.8%
o 37364
 
8.8%
i 37308
 
8.8%
e 28270
 
6.7%
n 27262
 
6.4%
l 25909
 
6.1%
s 23953
 
5.7%
t 23753
 
5.6%
h 20046
 
4.7%
r 13444
 
3.2%
Other values (34) 144029
34.1%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 350844
83.0%
Uppercase Letter 58865
 
13.9%
Space Separator 11387
 
2.7%
Other Punctuation 1661
 
0.4%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
a 41419
11.8%
o 37364
10.6%
i 37308
10.6%
e 28270
 
8.1%
n 27262
 
7.8%
l 25909
 
7.4%
s 23953
 
6.8%
t 23753
 
6.8%
h 20046
 
5.7%
r 13444
 
3.8%
Other values (13) 72116
20.6%
Uppercase Letter
Value Count Frequency (%)
C 8598
14.6%
L 6106
10.4%
B 5802
9.9%
S 4912
8.3%
A 4273
 
7.3%
P 3664
 
6.2%
D 3534
 
6.0%
M 3451
 
5.9%
O 3394
 
5.8%
H 2908
 
4.9%
Other values (9) 12223
20.8%
Space Separator
Value Count Frequency (%)
11387
100.0%
Other Punctuation
Value Count Frequency (%)
. 1661
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 409709
96.9%
Common 13048
 
3.1%

Most frequent character per script

Latin
Value Count Frequency (%)
a 41419
 
10.1%
o 37364
 
9.1%
i 37308
 
9.1%
e 28270
 
6.9%
n 27262
 
6.7%
l 25909
 
6.3%
s 23953
 
5.8%
t 23753
 
5.8%
h 20046
 
4.9%
r 13444
 
3.3%
Other values (32) 130981
32.0%
Common
Value Count Frequency (%)
11387
87.3%
. 1661
 
12.7%

Most occurring blocks

Value Count Frequency (%)
ASCII 422757
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
a 41419
 
9.8%
o 37364
 
8.8%
i 37308
 
8.8%
e 28270
 
6.7%
n 27262
 
6.4%
l 25909
 
6.1%
s 23953
 
5.7%
t 23753
 
5.6%
h 20046
 
4.7%
r 13444
 
3.2%
Other values (34) 144029
34.1%

state
Categorical

HIGH CORRELATION 

Distinct 27
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB
CA
6195 
IL
5523 
TX
4282 
PA
3664 
MD
2827 
Other values (22)
24987 

Length

Max length 2
Median length 2
Mean length 2
Min length 2

Characters and Unicode

Total characters 94956
Distinct characters 19
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row NM
2nd row NM
3rd row NM
4th row NM
5th row NM

Common Values

Value Count Frequency (%)
CA 6195 13.0%
IL 5523 11.6%
TX 4282 9.0%
PA 3664 7.7%
MD 2827 6.0%
MI 2496 5.3%
TN 2265 4.8%
LA 1817 3.8%
FL 1811 3.8%
OH 1761 3.7%
Other values (17) 14837 31.3%

Length

Histogram of lengths of the category

Value Count Frequency (%)
ca 6195 13.0%
il 5523 11.6%
tx 4282 9.0%
pa 3664 7.7%
md 2827 6.0%
mi 2496 5.3%
tn 2265 4.8%
la 1817 3.8%
fl 1811 3.8%
oh 1761 3.7%
Other values (17) 14837 31.3%

Most occurring characters

Value Count Frequency (%)
A 14581
15.4%
I 10455
11.0%
L 9937
10.5%
C 8752
9.2%
M 8237
8.7%
N 8004
8.4%
T 6547
6.9%
O 4959
 
5.2%
X 4282
 
4.5%
D 4135
 
4.4%
Other values (9) 15067
15.9%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 94956
100.0%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
A 14581
15.4%
I 10455
11.0%
L 9937
10.5%
C 8752
9.2%
M 8237
8.7%
N 8004
8.4%
T 6547
6.9%
O 4959
 
5.2%
X 4282
 
4.5%
D 4135
 
4.4%
Other values (9) 15067
15.9%

Most occurring scripts

Value Count Frequency (%)
Latin 94956
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
A 14581
15.4%
I 10455
11.0%
L 9937
10.5%
C 8752
9.2%
M 8237
8.7%
N 8004
8.4%
T 6547
6.9%
O 4959
 
5.2%
X 4282
 
4.5%
D 4135
 
4.4%
Other values (9) 15067
15.9%

Most occurring blocks

Value Count Frequency (%)
ASCII 94956
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
A 14581
15.4%
I 10455
11.0%
L 9937
10.5%
C 8752
9.2%
M 8237
8.7%
N 8004
8.4%
T 6547
6.9%
O 4959
 
5.2%
X 4282
 
4.5%
D 4135
 
4.4%
Other values (9) 15067
15.9%

lat
Real number (ℝ)

HIGH CORRELATION 

Distinct 40953
Distinct (%) 86.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 37.269773
Minimum 25.725214
Maximum 45.05119
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 371.0 KiB

Quantile statistics

Minimum 25.725214
5-th percentile 29.685616
Q1 34.0278
median 38.661111
Q3 40.459645
95-th percentile 42.431242
Maximum 45.05119
Range 19.325976
Interquartile range (IQR) 6.4318452

Descriptive statistics

Standard deviation 4.3377485
Coefficient of variation (CV) 0.11638784
Kurtosis -0.68894128
Mean 37.269773
Median Absolute Deviation (MAD) 3.1492084
Skewness -0.5834027
Sum 1769494.3
Variance 18.816062
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value Count Frequency (%)
30.302565 15 < 0.1%
37.7503065 14 < 0.1%
34.075761 14 < 0.1%
38.8741662 12 < 0.1%
34.1016 12 < 0.1%
37.506284 11 < 0.1%
33.9456 11 < 0.1%
34.2085 11 < 0.1%
41.8645202 10 < 0.1%
41.794929 10 < 0.1%
Other values (40943) 47358 99.7%

Minimum 10 values

Value Count Frequency (%)
25.7252139 1 < 0.1%
25.7262775 1 < 0.1%
25.7273453 1 < 0.1%
25.7280792 1 < 0.1%
25.7305986 1 < 0.1%
25.7310598 1 < 0.1%
25.7328853 1 < 0.1%
25.7398679 2 < 0.1%
25.7400967 1 < 0.1%
25.7463239 1 < 0.1%

Maximum 10 values

Value Count Frequency (%)
45.05119 1 < 0.1%
45.05052 1 < 0.1%
45.04835 1 < 0.1%
45.04752 1 < 0.1%
45.04471 1 < 0.1%
45.04333 1 < 0.1%
45.04293 1 < 0.1%
45.04223 1 < 0.1%
45.04155 2 < 0.1%
45.03765 2 < 0.1%

lon
Real number (ℝ)

HIGH CORRELATION 

Distinct 40640
Distinct (%) 85.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean -90.846625
Minimum -122.50778
Maximum -71.011519
Zeros 0
Zeros (%) 0.0%
Negative 47478
Negative (%) 100.0%
Memory size 371.0 KiB

Quantile statistics

Minimum -122.50778
5-th percentile -121.26209
Q1 -95.501941
median -87.653205
Q3 -81.649653
95-th percentile -75.149038
Maximum -71.011519
Range 51.49626
Interquartile range (IQR) 13.852288

Descriptive statistics

Standard deviation 13.903806
Coefficient of variation (CV) -0.15304703
Kurtosis 0.089284055
Mean -90.846625
Median Absolute Deviation (MAD) 7.6327799
Skewness -1.0492842
Sum -4313216.1
Variance 193.31582
Monotonicity Not monotonic
2023-09-17T15:57:43.870759 image/svg+xml Matplotlib v3.7.0, https://matplotlib.org/
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
-118.2739 62 0.1%
-118.2827 31 0.1%
-118.309 27 0.1%
-118.2783 25 0.1%
-118.3089 21 < 0.1%
-118.2695 20 < 0.1%
-118.2893 16 < 0.1%
-118.2651 16 < 0.1%
-81.732164 15 < 0.1%
-118.2871 15 < 0.1%
Other values (40630) 47230 99.5%
Minimum 10 values
Value Count Frequency (%)
-122.507779 1 < 0.1%
-122.5046043 1 < 0.1%
-122.5032863 1 < 0.1%
-122.4920669 1 < 0.1%
-122.4884236 1 < 0.1%
-122.487628 1 < 0.1%
-122.4854123 1 < 0.1%
-122.484382 1 < 0.1%
-122.4842604 1 < 0.1%
-122.4826038 1 < 0.1%
Maximum 10 values
Value Count Frequency (%)
-71.0115188 1 < 0.1%
-71.0123576 1 < 0.1%
-71.0192427 1 < 0.1%
-71.0306083 1 < 0.1%
-71.0312049 1 < 0.1%
-71.0319745 1 < 0.1%
-71.0325517 1 < 0.1%
-71.0329989 1 < 0.1%
-71.034274 1 < 0.1%
-71.034615 1 < 0.1%

LOCATION
Categorical

HIGH CORRELATION 

Distinct 47
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB
Chicago, IL 5523
Philadelphia, PA 3036
Houston, TX 2908
Baltimore, MD 2827
Detroit, MI 2496
Other values (42) 30688

Length

Max length 18
Median length 16
Mean length 12.904271
Min length 9

Characters and Unicode

Total characters 612669
Distinct characters 49
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Albuquerque, NM
2nd row Albuquerque, NM
3rd row Albuquerque, NM
4th row Albuquerque, NM
5th row Albuquerque, NM

Common Values

Value Count Frequency (%)
Chicago, IL 5523 11.6%
Philadelphia, PA 3036 6.4%
Houston, TX 2908 6.1%
Baltimore, MD 2827 6.0%
Detroit, MI 2496 5.3%
Los Angeles, CA 2196 4.6%
St. Louis, MO 1661 3.5%
Memphis, TN 1510 3.2%
New Orleans, LA 1394 2.9%
Indianapolis, IN 1321 2.8%
Other values (37) 22606 47.6%

Length

Histogram of lengths of the category

Words

Value Count Frequency (%)
ca 6195 5.8%
il 5523 5.2%
chicago 5523 5.2%
tx 4282 4.0%
pa 3664 3.4%
philadelphia 3036 2.9%
houston 2908 2.7%
baltimore 2827 2.7%
md 2827 2.7%
detroit 2496 2.3%
Other values (73) 67062 63.1%
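
The lowercase tokens listed above (ca, il, chicago, ...) look like the result of lowercasing each LOCATION value and splitting it into words. The following is a minimal sketch of that kind of count, assuming a simple letters-only tokenizer; the report does not state the tokenization it used, and `word_counts` is a hypothetical helper name.

```python
import re
from collections import Counter

def word_counts(values):
    """Lowercase each string, split it into runs of letters, and count the tokens."""
    counts = Counter()
    for text in values:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

# Three illustrative LOCATION values taken from the tables in this report.
counts = word_counts(["Chicago, IL", "Chicago, IL", "St. Louis, MO"])
print(counts.most_common())
```

Under this scheme a state abbreviation shared by several cities (for example ca) accumulates the counts of all of them, which would explain why it can outrank any single city name.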

Most occurring characters

Value Count Frequency (%)
(space) 58865 9.6%
, 47478 7.7%
a 41419 6.8%
o 37364 6.1%
i 37308 6.1%
e 28270 4.6%
n 27262 4.4%
l 25909 4.2%
s 23953 3.9%
t 23753 3.9%
Other values (39) 261088 42.6%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 350844 57.3%
Uppercase Letter 153821 25.1%
Space Separator 58865 9.6%
Other Punctuation 49139 8.0%
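
The category names in this table are Unicode general categories: Lowercase Letter is `Ll`, Uppercase Letter is `Lu`, Space Separator is `Zs`, and Other Punctuation is `Po`. As an illustration only (not the profiler's own code), Python's standard `unicodedata` module can produce the same kind of tally; `category_counts` is a hypothetical helper name.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count characters by Unicode general category code (e.g. 'Lu', 'Ll', 'Zs', 'Po')."""
    counts = Counter()
    for text in values:
        for ch in text:
            counts[unicodedata.category(ch)] += 1
    return counts

# Two LOCATION values from this report, used as a tiny example.
counts = category_counts(["Albuquerque, NM", "St. Louis, MO"])
print(dict(counts))
```

For these two strings the tally is 15 lowercase letters, 7 uppercase letters, 3 spaces, and 3 punctuation marks. The period in "St. Louis" is why Other Punctuation in the table above (49139) exceeds the comma count (47478) by exactly the 1661 St. Louis rows.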

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
a 41419 11.8%
o 37364 10.6%
i 37308 10.6%
e 28270 8.1%
n 27262 7.8%
l 25909 7.4%
s 23953 6.8%
t 23753 6.8%
h 20046 5.7%
r 13444 3.8%
Other values (13) 72116 20.6%

Uppercase Letter
Value Count Frequency (%)
A 18853 12.3%
C 17350 11.3%
L 16042 10.4%
I 11776 7.7%
M 11688 7.6%
N 10775 7.0%
O 8354 5.4%
D 7669 5.0%
P 7328 4.8%
T 7318 4.8%
Other values (13) 36668 23.8%

Other Punctuation
Value Count Frequency (%)
, 47478 96.6%
. 1661 3.4%

Space Separator
Value Count Frequency (%)
(space) 58865 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 504665 82.4%
Common 108004 17.6%

Most frequent character per script

Latin
Value Count Frequency (%)
a 41419 8.2%
o 37364 7.4%
i 37308 7.4%
e 28270 5.6%
n 27262 5.4%
l 25909 5.1%
s 23953 4.7%
t 23753 4.7%
h 20046 4.0%
A 18853 3.7%
Other values (36) 220528 43.7%

Common
Value Count Frequency (%)
(space) 58865 54.5%
, 47478 44.0%
. 1661 1.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 612669 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
(space) 58865 9.6%
, 47478 7.7%
a 41419 6.8%
o 37364 6.1%
i 37308 6.1%
e 28270 4.6%
n 27262 4.4%
l 25909 4.2%
s 23953 3.9%
t 23753 3.9%
Other values (39) 261088 42.6%

POPULATION
Real number (ℝ)

HIGH CORRELATION 

Distinct 469
Distinct (%) 1.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 1389867.1
Minimum 133452
Maximum 19636391
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 371.0 KiB

Quantile statistics

Minimum 133452
5-th percentile 259604
Q1 470788
median 675031
Q3 1565949
95-th percentile 3818812
Maximum 19636391
Range 19502939
Interquartile range (IQR) 1095161

Descriptive statistics

Standard deviation 2320041.6
Coefficient of variation (CV) 1.6692543
Kurtosis 47.004354
Mean 1389867.1
Median Absolute Deviation (MAD) 283212
Skewness 6.3978381
Sum 6.5988109 × 10^10
Variance 5.3825931 × 10^12
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
2716297 764 1.6%
2710456 653 1.4%
2697359 510 1.1%
2719039 504 1.1%
2724006 481 1.0%
2697006 457 1.0%
2703991 445 0.9%
2708067 435 0.9%
2695598 434 0.9%
2725575 423 0.9%
Other values (459) 42372 89.2%
Minimum 10 values
Value Count Frequency (%)
133452 12 < 0.1%
133651 19 < 0.1%
135734 20 < 0.1%
136286 15 < 0.1%
140443 23 < 0.1%
142377 18 < 0.1%
142683 20 < 0.1%
143991 25 0.1%
145312 37 0.1%
146048 4 < 0.1%
Maximum 10 values
Value Count Frequency (%)
19636391 333 0.7%
19593849 289 0.6%
3975067 277 0.6%
3957520 290 0.6%
3933644 276 0.6%
3904102 251 0.5%
3877721 246 0.5%
3847857 291 0.6%
3818812 284 0.6%
3792621 281 0.6%
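
The Frequency (%) column in the tables above appears to be the count divided by the 47478 observations, rounded to one decimal place, with anything that would round to 0.0% shown as "< 0.1%". The sketch below reproduces that rule as inferred from the rows in this report; it is not taken from the profiler's source, and `frequency_label` is a hypothetical name.

```python
TOTAL_ROWS = 47478  # number of observations, from the dataset overview

def frequency_label(count, total=TOTAL_ROWS):
    """Format count/total as a one-decimal percentage, or '< 0.1%' if it rounds to zero."""
    label = f"{100 * count / total:.1f}%"
    return "< 0.1%" if label == "0.0%" else label

# Rows from the POPULATION tables: 764 and 42372 (common values), 25 and 23 (minimum values).
for count in (764, 42372, 25, 23):
    print(count, frequency_label(count))
```

This explains why a count of 25 is shown as 0.1% (0.053% rounds up) while a count of 23 is shown as "< 0.1%" (0.048% rounds down).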

Interactions

Interaction plots for each pair of the 5 numeric variables (25 figures; images not included in this text version).

Correlations

victim_age reported_year lat lon POPULATION disposition victim_sex victim_race age_range reported_month reported_weekday season city state LOCATION
victim_age 1.000 0.053 -0.074 -0.038 -0.007 0.128 0.190 0.129 0.837 0.005 0.022 0.009 0.064 0.058 0.064
reported_year 0.053 1.000 0.006 -0.011 0.080 0.104 0.014 0.021 0.026 0.014 0.018 0.018 0.136 0.108 0.136
lat -0.074 0.006 1.000 0.420 0.148 0.166 0.073 0.165 0.050 0.018 0.027 0.025 0.979 0.748 0.979
lon -0.038 -0.011 0.420 1.000 0.012 0.094 0.082 0.236 0.058 0.013 0.028 0.017 1.000 0.882 1.000
POPULATION -0.007 0.080 0.148 0.012 1.000 0.127 0.035 0.155 0.029 0.011 0.030 0.010 0.984 0.783 0.984
disposition 0.128 0.104 0.166 0.094 0.127 1.000 0.102 0.112 0.104 0.056 0.000 0.032 0.240 0.224 0.240
victim_sex 0.190 0.014 0.073 0.082 0.035 0.102 1.000 0.163 0.150 0.017 0.020 0.018 0.111 0.105 0.111
victim_race 0.129 0.021 0.165 0.236 0.155 0.112 0.163 1.000 0.123 0.012 0.025 0.013 0.305 0.268 0.305
age_range 0.837 0.026 0.050 0.058 0.029 0.104 0.150 0.123 1.000 0.011 0.023 0.011 0.079 0.073 0.079
reported_month 0.005 0.014 0.018 0.013 0.011 0.056 0.017 0.012 0.011 1.000 0.008 1.000 0.022 0.020 0.022
reported_weekday 0.022 0.018 0.027 0.028 0.030 0.000 0.020 0.025 0.023 0.008 1.000 0.008 0.041 0.033 0.041
season 0.009 0.018 0.025 0.017 0.010 0.032 0.018 0.013 0.011 1.000 0.008 1.000 0.034 0.030 0.034
city 0.064 0.136 0.979 1.000 0.984 0.240 0.111 0.305 0.079 0.022 0.041 0.034 1.000 1.000 1.000
state 0.058 0.108 0.748 0.882 0.783 0.224 0.105 0.268 0.073 0.020 0.033 0.030 1.000 1.000 1.000
LOCATION 0.064 0.136 0.979 1.000 0.984 0.240 0.111 0.305 0.079 0.022 0.041 0.034 1.000 1.000 1.000
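
The entries of 1.000 between city, state, and LOCATION are what an association measure for categorical variables returns when one column fully determines another. The report does not say which coefficient it used; the sketch below assumes an uncorrected Cramér's V, implemented with the standard library only, to show the effect on a toy example. The function name `cramers_v` and the sample lists are illustrative.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long sequences of category labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    rows, cols = Counter(xs), Counter(ys)
    chi2 = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

city = ["Chicago", "Chicago", "Houston", "Detroit"]
location = ["Chicago, IL", "Chicago, IL", "Houston, TX", "Detroit, MI"]
weekday = ["Monday", "Friday", "Monday", "Friday"]

v_city_location = cramers_v(city, location)  # each city maps to exactly one LOCATION
v_city_weekday = cramers_v(city, weekday)    # weekday is not determined by city
print(v_city_location, v_city_weekday)
```

Because LOCATION is just the city and state joined together, a value of 1.000 here signals redundancy between columns rather than a finding about the data.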

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

uid disposition victim_sex victim_race victim_age age_range reported_date reported_year reported_month reported_weekday season city state lat lon LOCATION POPULATION
0 Alb-000001 No Arrest Male Hispanic 78 65+ 2010-05-04 2010 May Tuesday Spring Albuquerque NM 35.095788 -106.538555 Albuquerque, NM 545852
1 Alb-000002 Arrest Made Male Hispanic 17 0-17 2010-02-16 2010 February Tuesday Winter Albuquerque NM 35.056810 -106.715321 Albuquerque, NM 545852
2 Alb-000003 No Arrest Female White 15 0-17 2010-06-01 2010 June Tuesday Summer Albuquerque NM 35.086092 -106.695568 Albuquerque, NM 545852
3 Alb-000004 Arrest Made Male Hispanic 32 30-44 2010-01-01 2010 January Friday Winter Albuquerque NM 35.078493 -106.556094 Albuquerque, NM 545852
4 Alb-000005 No Arrest Female White 72 65+ 2010-01-02 2010 January Saturday Winter Albuquerque NM 35.130357 -106.580986 Albuquerque, NM 545852
5 Alb-000006 No Arrest Female White 91 65+ 2010-01-26 2010 January Tuesday Winter Albuquerque NM 35.151110 -106.537797 Albuquerque, NM 545852
6 Alb-000007 Arrest Made Male Hispanic 52 45-64 2010-01-27 2010 January Wednesday Winter Albuquerque NM 35.111785 -106.712614 Albuquerque, NM 545852
7 Alb-000008 Arrest Made Female Hispanic 52 45-64 2010-01-27 2010 January Wednesday Winter Albuquerque NM 35.111785 -106.712614 Albuquerque, NM 545852
8 Alb-000009 No Arrest Male White 56 45-64 2010-01-30 2010 January Saturday Winter Albuquerque NM 35.075380 -106.553458 Albuquerque, NM 545852
9 Alb-000010 No Arrest Male Hispanic 43 30-44 2010-02-10 2010 February Wednesday Winter Albuquerque NM 35.065930 -106.572288 Albuquerque, NM 545852
uid disposition victim_sex victim_race victim_age age_range reported_date reported_year reported_month reported_weekday season city state lat lon LOCATION POPULATION
47468 Was-001375 Arrest Made Male Black 22 18-29 2016-02-24 2016 February Wednesday Winter Washington DC 38.843399 -77.000104 Washington, DC 687576
47469 Was-001376 No Arrest Male Black 25 18-29 2016-07-31 2016 July Sunday Summer Washington DC 38.863322 -76.995309 Washington, DC 687576
47470 Was-001377 Arrest Made Male Black 35 30-44 2016-09-16 2016 September Friday Fall Washington DC 38.845871 -76.998169 Washington, DC 687576
47471 Was-001378 Arrest Made Male Black 37 30-44 2016-04-15 2016 April Friday Spring Washington DC 38.826458 -77.003590 Washington, DC 687576
47472 Was-001379 No Arrest Male Black 20 18-29 2016-07-15 2016 July Friday Summer Washington DC 38.827266 -77.001572 Washington, DC 687576
47473 Was-001380 Arrest Made Male Black 29 18-29 2016-09-08 2016 September Thursday Fall Washington DC 38.828704 -77.002075 Washington, DC 687576
47474 Was-001381 No Arrest Male Black 19 18-29 2016-09-13 2016 September Tuesday Fall Washington DC 38.822852 -77.001725 Washington, DC 687576
47475 Was-001382 No Arrest Male Black 23 18-29 2016-11-14 2016 November Monday Fall Washington DC 38.828025 -77.002511 Washington, DC 687576
47476 Was-001383 No Arrest Male Black 24 18-29 2016-11-30 2016 November Wednesday Fall Washington DC 38.820476 -77.008640 Washington, DC 687576
47477 Was-001384 Arrest Made Male Black 17 0-17 2016-09-01 2016 September Thursday Fall Washington DC 38.866689 -76.982409 Washington, DC 687576
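
The sample rows suggest that several columns are derived from others: age_range from victim_age, the reported_* fields and season from reported_date, and LOCATION from city and state. That would account for the high-correlation alerts between those pairs. The sketch below reproduces the derived fields for two of the rows shown; the bucket boundaries and the month-to-season mapping are assumptions inferred from the labels and sample rows, not something the report documents, and all function names are hypothetical.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def age_range(age):
    """Bucket an age using the labels seen in the sample (boundaries are assumed)."""
    if age <= 17:
        return "0-17"
    if age <= 29:
        return "18-29"
    if age <= 44:
        return "30-44"
    if age <= 64:
        return "45-64"
    return "65+"

def season(month):
    """Map a month number to a season (assumed: Dec-Feb Winter, Mar-May Spring, ...)."""
    return ["Winter", "Spring", "Summer", "Fall"][(month % 12) // 3]

def derive(city, state, victim_age, reported):
    """Build the derived columns for one record."""
    return {
        "age_range": age_range(victim_age),
        "reported_year": reported.year,
        "reported_month": MONTHS[reported.month - 1],
        "reported_weekday": WEEKDAYS[reported.weekday()],
        "season": season(reported.month),
        "LOCATION": f"{city}, {state}",
    }

first = derive("Albuquerque", "NM", 78, date(2010, 5, 4))   # row 0 (Alb-000001)
later = derive("Washington", "DC", 35, date(2016, 9, 16))   # row 47470 (Was-001377)
print(first)
print(later)
```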